
Creating and modifying a dataset

You can create a standard wide-format dataset by:

  1. linking to a database with the require command (needed only once per script),
  2. creating an empty dataset with the create-dataset command, and
  3. importing at least one variable into the empty dataset with the import command.

Unless you have special needs, connect to the latest version of the relevant database. That gives you access to the newest variables and the most recent updates. The version number is shown at the top left of the variable overview.

Each import command imports exactly one variable into a wide-format dataset and does two things:

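  • Retrieves the variable's observations at a given measurement time (no measurement time is specified for fixed information such as gender)

  • Links the observations to the current dataset population via a built-in unique unit identifier. On the first import, all observations for the given time are retrieved and define the population; later imports only add values for units that are already in the dataset (the left-join principle)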
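The linking behaviour described above can be pictured with a small Python model. This is a conceptual sketch under stated assumptions, not the microdata.no implementation: the function `import_variable`, the dictionary layout and the unit identifiers `u1` to `u4` are all hypothetical. A dataset is modelled as a mapping from unit identifier to a row of variables; the first import defines the population, and later imports are left-joined onto it.

```python
# Conceptual model only (hypothetical, not the microdata.no implementation)
# of how an import attaches one variable to a wide-format dataset.
from typing import Dict

Dataset = Dict[str, Dict[str, object]]  # unit identifier -> {variable: value}


def import_variable(dataset: Dataset, source: Dict[str, object], name: str) -> Dataset:
    """Left-join `source` (unit identifier -> value) onto `dataset` as column `name`."""
    if not dataset:
        # First import: every unit with an observation defines the population.
        return {unit: {name: value} for unit, value in source.items()}
    # Later imports: keep the existing population. Units without an
    # observation get a missing value (None); units not already present
    # in the dataset are ignored.
    return {unit: {**row, name: source.get(unit)} for unit, row in dataset.items()}


# Hypothetical observations keyed by made-up unit identifiers.
gender = {"u1": "1", "u2": "2", "u3": "2"}
sivstatus = {"u1": "2", "u3": "1", "u4": "2"}  # u4 is not in the population

demography: Dataset = {}
demography = import_variable(demography, gender, "gender")
demography = import_variable(demography, sivstatus, "sivstatus")
print(demography)
```

Unit u2 keeps its row with a missing marital status, while u4 is left out because it was not part of the population created by the first import.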

TIP

You can override the so-called left-join principle with the import option outer_join. The import then retrieves all observations for the given time, including those for units that are not already in the dataset population. This is useful when you want data on all individuals over a longer period (through repeated measurements of a variable), not just on those who had an observation at the first measurement time. Chapter 2.3.1 of the User Guide explains this in more detail.
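The effect of outer_join on repeated measurements can be sketched with a similar Python model. Again this is hypothetical and conceptual, not the platform's code: the `outer_join` flag stands in for the import option, and the marital-status values and unit identifiers are made up. Importing the same variable at two measurement times shows that only the outer join brings in units that first appear at the later time.

```python
# Conceptual model only (hypothetical): default left join vs. the
# outer_join option when a variable is imported at several times.
from typing import Dict, Optional

Dataset = Dict[str, Dict[str, Optional[str]]]


def import_variable(dataset: Dataset, source: Dict[str, str], name: str,
                    outer_join: bool = False) -> Dataset:
    result = {unit: dict(row) for unit, row in dataset.items()}
    existing_columns = {col for row in dataset.values() for col in row}
    # Attach the new variable to every unit already in the population.
    for unit in result:
        result[unit][name] = source.get(unit)  # no observation -> None
    # First import, or outer join: also add units not yet in the population.
    # Their earlier variables are set to missing (None).
    if outer_join or not dataset:
        for unit, value in source.items():
            if unit not in result:
                row: Dict[str, Optional[str]] = {col: None for col in existing_columns}
                row[name] = value
                result[unit] = row
    return result


# Hypothetical marital status at two times; u3 first appears in 2021.
status_2020 = {"u1": "2", "u2": "1"}
status_2021 = {"u1": "2", "u3": "2"}

left = import_variable({}, status_2020, "sivstatus20")
left = import_variable(left, status_2021, "sivstatus21")

outer = import_variable({}, status_2020, "sivstatus20")
outer = import_variable(outer, status_2021, "sivstatus21", outer_join=True)

print(sorted(left))   # ['u1', 'u2']
print(sorted(outer))  # ['u1', 'u2', 'u3']
```

With the default left join, u3 never enters the dataset; with the outer join it is added with a missing value for the 2020 measurement.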

After the dataset is created, it can be modified as needed. For example, you can rename datasets or variables, remove variables, or remove observations.

Example:

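// Link to version 23 of the database no.ssb.fdb under the alias db
require no.ssb.fdb:23 as db

// Create an empty dataset and import variables into it.
// Fixed information (gender, birth date) is imported without a date.
create-dataset demography
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_FOEDSELS_AAR_MND as birthdate
import db/SIVSTANDFDT_SIVSTAND 2020-01-01 as sivstatus
import db/INNTEKT_BRUTTOFORM 2020-01-01 as wealth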

// Rename variables by adding a year reference
rename sivstatus sivstatus20
rename wealth wealth20

// Delete the variable gender from the dataset
drop gender

// Keep only married persons (marital status code '2')
keep if sivstatus20 == '2'